{
 "metadata": {
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.10"
  },
  "orig_nbformat": 4,
  "kernelspec": {
   "name": "python3",
   "display_name": "Python 3.7.10 64-bit ('py37': conda)"
  },
  "interpreter": {
   "hash": "fbea1422c2cf61ed9c0cfc03f38f71cc9083cc288606edc4170b5309b352ce27"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2,
 "cells": [
  {
   "cell_type": "markdown",
   "source": [],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "## 回归\n",
    "\n",
    "根据因变量的不同，分成几种回归：  \n",
    "* 连续：多重线性回归(注意与多元线性回归有区别，比如多元自变量是连续的，多重则可以是多种数据类型等)  \n",
    "* 二项分布：logistic回归\n",
    "* poisson分布：poisson回归\n",
    "* 负二项分布：负二项回归\n",
    "\n",
    "## 逻辑回归\n",
    "\n",
    "同线性回归一样，需要求出$n$个参数：\n",
    "\n",
    "$$\n",
    "z=\\theta_0+\\theta_1x+\\theta_2x+...+\\theta_nx=\\theta^Tx\n",
    "$$\n",
    "\n",
    "逻辑回归通过Sigmoid函数引入了非线性因素，可以轻松处理二分类问题:\n",
    "\n",
    "$$\n",
    "h_{\\theta}(x)=g\\left(\\theta^{T} x\\right), g(z)=\\frac{1}{1+e^{-z}}\n",
    "$$\n",
    "\n",
    "与线性回归不同，逻辑回归使用的是交叉熵损失函数:\n",
    "\n",
    "$$\n",
    "J(\\theta)=-\\frac{1}{m}\\left[\\sum_{i=1}^{m}\\left(y^{(i)} \\log h_{\\theta}\\left(x^{(i)}\\right)+\\left(1-y^{(i)}\\right) \\log \\left(1-h_{\\theta}\\left(x^{(i)}\\right)\\right)\\right]\\right.\n",
    "$$\n",
    "\n",
    "其梯度为:\n",
    "\n",
    "$$\n",
    "\\frac{\\partial J(\\theta)}{\\partial \\theta_{j}} = \\frac{1}{m} \\sum_{i=0}^{m}\\left(h_{\\theta}-y^{i}\\left(x^{i}\\right)\\right) x_{j}^{i}\n",
    "$$\n",
    "\n",
    "形式和线性回归一样，但其实假设函数(Hypothesis function)不一样，逻辑回归是:\n",
    "$$\n",
    "h_{\\theta}(x)=\\frac{1}{1+e^{-\\theta^{T} x}}\n",
    "$$\n",
    "\n",
    "其推导如下:\n",
    "\n",
    "$$\n",
    "\\begin{aligned}\n",
    "\\frac{\\partial}{\\partial \\theta_{j}} J(\\theta) &=\\frac{\\partial}{\\partial \\theta_{j}}\\left[-\\frac{1}{m} \\sum_{i=1}^{m}\\left[y^{(i)} \\log \\left(h_{\\theta}\\left(x^{(i)}\\right)\\right)+\\left(1-y^{(i)}\\right) \\log \\left(1-h_{\\theta}\\left(x^{(i)}\\right)\\right)\\right]\\right] \\\\\n",
    "&=-\\frac{1}{m} \\sum_{i=1}^{m}\\left[y^{(i)} \\frac{1}{\\left.h_{\\theta}\\left(x^{(i)}\\right)\\right)} \\frac{\\partial}{\\partial \\theta_{j}} h_{\\theta}\\left(x^{(i)}\\right)-\\left(1-y^{(i)}\\right) \\frac{1}{1-h_{\\theta}\\left(x^{(i)}\\right)} \\frac{\\partial}{\\partial \\theta_{j}} h_{\\theta}\\left(x^{(i)}\\right)\\right] \\\\\n",
    "&=-\\frac{1}{m} \\sum_{i=1}^{m}\\left[y^{(i)} \\frac{1}{\\left.h_{\\theta}\\left(x^{(i)}\\right)\\right)}-\\left(1-y^{(i)}\\right) \\frac{1}{1-h_{\\theta}\\left(x^{(i)}\\right)}\\right] \\frac{\\partial}{\\partial \\theta_{j}} h_{\\theta}\\left(x^{(i)}\\right) \\\\\n",
    "&=-\\frac{1}{m} \\sum_{i=1}^{m}\\left[y^{(i)} \\frac{1}{\\left.h_{\\theta}\\left(x^{(i)}\\right)\\right)}-\\left(1-y^{(i)}\\right) \\frac{1}{1-h_{\\theta}\\left(x^{(i)}\\right)}\\right] \\frac{\\partial}{\\partial \\theta_{j}} g\\left(\\theta^{T} x^{(i)}\\right)\n",
    "\\end{aligned}\n",
    "$$\n",
    "\n",
    "因为:\n",
    "$$\n",
    "\\begin{aligned}\n",
    "\\frac{\\partial}{\\partial \\theta_{j}} g\\left(\\theta^{T} x^{(i)}\\right) &=\\frac{\\partial}{\\partial \\theta_{j}} \\frac{1}{1+e^{-\\theta^{T} x^{(i)}}} \\\\\n",
    "&=\\frac{e^{-\\theta^{T} x^{(i)}}}{\\left(1+^{-\\theta} T^{T_{x}(i)}\\right)^{2}} \\frac{\\partial}{\\partial \\theta_{j}} \\theta^{T} x^{(i)} \\\\\n",
    "&=g\\left(\\theta^{T} x^{(i)}\\right)\\left(1-g\\left(\\theta^{T} x^{(i)}\\right)\\right) x_{j}^{(i)}\n",
    "\\end{aligned}\n",
    "$$\n",
    "所以:\n",
    "$$\n",
    "\\begin{aligned}\n",
    "\\frac{\\partial}{\\partial \\theta_{j}} J(\\theta) &=-\\frac{1}{m} \\sum_{i=1}^{m}\\left[y^{(i)}\\left(1-g\\left(\\theta^{T} x^{(i)}\\right)\\right)-\\left(1-y^{(i)}\\right) g\\left(\\theta^{T} x^{(i)}\\right)\\right] x_{j}^{(i)} \\\\\n",
    "&=-\\frac{1}{m} \\sum_{i=1}^{m}\\left(y^{(i)}-g\\left(\\theta^{T} x^{(i)}\\right)\\right) x_{j}^{(i)} \\\\\n",
    "&=\\frac{1}{m} \\sum_{i=1}^{m}\\left(h_{\\theta}\\left(x^{(i)}\\right)-y^{(i)}\\right) x_{j}^{(i)}\n",
    "\\end{aligned}\n",
    "$$\n",
    "\n",
    "\n"
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "source": [
    "import sys\n",
    "from pathlib import Path\n",
    "curr_path = str(Path().absolute())\n",
    "parent_path = str(Path().absolute().parent)\n",
    "sys.path.append(parent_path) # add current terminal path to sys.path\n",
    "\n",
    "import numpy as np\n",
    "from Mnist.load_data import load_local_mnist\n",
    "\n",
    "(x_train, y_train), (x_test, y_test) = load_local_mnist(one_hot=False)\n",
    "\n",
    "# print(np.shape(x_train),np.shape(y_train))\n",
    "\n",
    "ones_col=[[1] for i in range(len(x_train))] # 生成全为1的二维嵌套列表，即[[1],[1],...,[1]]\n",
    "x_train_modified=np.append(x_train,ones_col,axis=1)\n",
    "ones_col=[[1] for i in range(len(x_test))]\n",
    "x_test_modified=np.append(x_test,ones_col,axis=1)\n",
    "\n",
    "# print(np.shape(x_train_modified))\n",
    "\n",
    "# Mnsit有0-9十个标记，由于是二分类任务，所以可以将标记0的作为1，其余为0用于识别是否为0的任务\n",
    "y_train_modified=np.array([1 if y_train[i]==1 else 0 for i in range(len(y_train))])\n",
    "y_test_modified=np.array([1 if y_test[i]==1 else 0 for i in range(len(y_test))])\n",
    "n_iters=10 \n",
    "\n",
    "x_train_modified_mat = np.mat(x_train_modified)\n",
    "theta = np.mat(np.zeros(len(x_train_modified[0])))\n",
    "lr = 0.01 # 学习率\n",
    "\n",
    "def sigmoid(x):\n",
    "    '''sigmoid函数\n",
    "    '''\n",
    "    return 1.0/(1+np.exp(-x))\n"
   ],
   "outputs": [],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "## 小批量梯度下降法(Mini-batch Gradient Descent)\n"
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "source": [
    "\n",
    "for i_iter in range(n_iters):\n",
    "    for n in range(len(x_train_modified)):\n",
    "        hypothesis = sigmoid(np.dot(x_train_modified[n], theta.T))\n",
    "        error = y_train_modified[n]- hypothesis\n",
    "        grad = error*x_train_modified_mat[n]\n",
    "        theta += lr*grad\n",
    "    print('LogisticRegression Model(learning_rate={},i_iter={})'.format(\n",
    "    lr, i_iter+1))"
   ],
   "outputs": [],
   "metadata": {}
  }
 ]
}